Skip to content

Conversation

@martin-augment
Copy link
Owner

19830: To review by AI

Replace usages of str::to_lowercase().as_str() + patterm matching with
str::eq_ignore_ascii_case(&str) to avoid allocating temporary String.
@coderabbitai
Copy link

coderabbitai bot commented Jan 15, 2026

Walkthrough

This pull request refactors string comparison logic across multiple files to use case-insensitive matching via eq_ignore_ascii_case instead of to_lowercase-based comparisons. Changes occur in file type detection, object store mode parsing, print options handling, and date part extraction. The modifications replace match expressions and equality checks that relied on lowercasing strings with direct case-insensitive comparisons. All changes preserve existing behavior and control flow patterns while altering implementation approaches.

✨ Finishing touches
  • 📝 Generate docstrings

🧹 Recent nitpick comments
datafusion/functions/src/datetime/date_part.rs (1)

250-253: Inconsistent: is_epoch() still allocates via to_lowercase().

This function still uses to_lowercase() which allocates, inconsistent with the PR's goal. Consider updating it to use eq_ignore_ascii_case() as well.

♻️ Suggested fix
 fn is_epoch(part: &str) -> bool {
     let part = part_normalization(part);
-    matches!(part.to_lowercase().as_str(), "epoch")
+    part.eq_ignore_ascii_case("epoch")
 }

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9cedd2b and 5ec88b5.

📒 Files selected for processing (4)
  • datafusion-cli/src/exec.rs
  • datafusion-cli/src/object_storage/instrumented.rs
  • datafusion-cli/src/print_options.rs
  • datafusion/functions/src/datetime/date_part.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: Analyze (rust)
🔇 Additional comments (4)
datafusion-cli/src/object_storage/instrumented.rs (1)

122-130: LGTM!

The refactoring from match with to_lowercase() to eq_ignore_ascii_case() correctly avoids the temporary String allocation while preserving the same case-insensitive matching behavior. The existing tests at lines 843-853 already validate mixed-case inputs like "dIsABleD", "SUmMaRy", and "TRaCe".

datafusion-cli/src/exec.rs (1)

401-410: LGTM!

The case-insensitive file extension matching using eq_ignore_ascii_case() is correct and avoids unnecessary allocations. File extensions are ASCII-only, making this an appropriate choice.

datafusion-cli/src/print_options.rs (1)

51-54: LGTM!

Clean refactoring that avoids allocating a lowercase string for each comparison. The chained eq_ignore_ascii_case() calls correctly handle the three keywords ("inf", "infinite", "none") for unlimited rows.

datafusion/functions/src/datetime/date_part.rs (1)

217-231: LGTM!

The refactoring correctly replaces the match on a lowercased string with case-insensitive comparisons, avoiding allocation while preserving behavior for all date part variants.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link

claude bot commented Jan 15, 2026

Code Review: PR #185 - Don't allocate to compare string slices

Summary

This PR optimizes string comparisons by replacing to_lowercase().as_str() pattern with eq_ignore_ascii_case(), eliminating unnecessary heap allocations. This is a solid performance improvement for hot path code.


✅ Strengths

  1. Performance Improvement: The change eliminates unnecessary allocations when comparing strings case-insensitively. to_lowercase() creates a new String on the heap, while eq_ignore_ascii_case() performs comparison in-place.

  2. Consistent Optimization Pattern: All four modified files follow the same optimization pattern, making the changes uniform and easy to review.

  3. Functionally Equivalent: For ASCII-only comparisons (which these file extensions and SQL keywords are), eq_ignore_ascii_case() is semantically identical to to_lowercase().as_str().


🔍 Code Quality Observations

1. Code Readability Trade-off

The original match statement pattern was more compact and idiomatic Rust:

match ext.to_lowercase().as_str() {
    "csv" => Some(ConfigFileType::CSV),
    "json" => Some(ConfigFileType::JSON),
    ...
}

The new if-else chain is more verbose but avoids allocation:

if ext.eq_ignore_ascii_case("csv") {
    Some(ConfigFileType::CSV)
} else if ext.eq_ignore_ascii_case("json") {
    ...
}

Recommendation: This trade-off is acceptable for performance-sensitive code. Consider adding a brief comment in hot-path functions explaining the optimization.

2. Pattern Consistency Across Codebase

I found 88 occurrences of to_lowercase() across 39 files in the codebase. Notable remaining instances:

  • datafusion/functions/src/datetime/date_trunc.rs:88 - Similar pattern could be optimized
  • datafusion/common/src/format.rs:174, 223 - Multiple match statements
  • datafusion/common/src/config.rs:1772 - Config parsing

Recommendation: Consider filing a follow-up issue to audit and optimize other instances, particularly in hot paths.


🧪 Test Coverage

Existing Coverage: ✅ Good

  • The date_part function has extensive test coverage in datafusion/sqllogictest/test_files/datetime/date_part.slt
  • Tests include case variations: EXTRACT(quarter ...), EXTRACT("quarter" ...), EXTRACT('quarter' ...)
  • CLI file type parsing is tested through integration tests

Gap: The specific case-insensitive behavior is implicitly tested but not explicitly documented.

Recommendation: No additional tests needed - existing tests adequately cover the functionality. The optimization doesn't change behavior.


⚡ Performance Considerations

Positive Impact:

  1. Eliminates heap allocations in string comparison operations
  2. Reduces memory pressure and GC overhead (though Rust doesn't have GC, less allocation = less work for the allocator)
  3. Cache-friendly: In-place comparison is more CPU cache-efficient

Magnitude:

  • For individual calls: Small but measurable improvement (saves one allocation per comparison)
  • For hot paths (like date_part in query processing): Can add up significantly in tight loops over large datasets

Benchmark Recommendation: For date_part.rs:217-230, consider adding a microbenchmark if this is in a hot path, though the improvement is straightforward enough that it's likely not necessary.


🔒 Security Considerations

No security concerns identified.

The eq_ignore_ascii_case() method is safe and has identical behavior to to_lowercase() for ASCII characters. All the strings being compared (file extensions, SQL keywords) are ASCII-only.

Note: If any future strings contain non-ASCII characters, eq_ignore_ascii_case() and to_lowercase() may behave differently. However, for the current use cases (file extensions: "csv", "json", "parquet"; SQL parts: "quarter", "epoch", etc.), this is not a concern.


🐛 Potential Issues

None identified. The changes are straightforward and correct.


📝 Minor Observations

  1. Whitespace change in date_part.rs:57: Line 57 had trailing whitespace removed. ✅ Good cleanup.

  2. Error message preservation: In instrumented.rs:130, the error message still correctly uses the original string {s}, not a lowercased version. ✅ Correct.

  3. Return statement formatting: The semicolon placement in date_part.rs:232 is correct - it terminates the expression statement that creates arr.


✨ Recommendations

  1. ✅ Approve and Merge: This is a clean optimization with no functional changes.

  2. Consider Follow-up: Create an issue to audit other to_lowercase().as_str() patterns in performance-sensitive areas:

    • date_trunc.rs:88
    • format.rs:174, 223
    • Other datetime functions
  3. Documentation: If this is part of a broader performance optimization effort (ref: issue Don't allocate to compare string slices apache/datafusion#19830), consider documenting this pattern in contributing guidelines to prevent regression.


Final Verdict: ✅ LGTM (Looks Good To Me)

This PR makes a solid performance improvement without sacrificing correctness. The code changes are clean, consistent, and well-scoped.


🤖 Generated with Claude Code

@augmentcode
Copy link

augmentcode bot commented Jan 15, 2026

🤖 Augment PR Summary

Summary: Removes temporary string allocations when doing case-insensitive comparisons in several parsing paths.

Changes:

  • Replaced to_lowercase()-based matching with eq_ignore_ascii_case for config file extension detection in the CLI
  • Updated object store instrumentation mode parsing to avoid allocating while matching keywords
  • Adjusted max-rows parsing (inf/infinite/none) to use allocation-free comparisons
  • Updated date_part special-case handling to use allocation-free comparisons for supported parts

Technical Notes: Uses eq_ignore_ascii_case (ASCII-only case-insensitive comparison), which is appropriate for these ASCII keywords/extensions and avoids creating intermediate lowercase strings.

🤖 Was this summary useful? React with 👍 or 👎

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants